Skip to content

Use dynamic configuration model in bundles#1098

Merged
pietern merged 142 commits into
mainfrom
bundle-use-dyn
Feb 16, 2024
Merged

Use dynamic configuration model in bundles#1098
pietern merged 142 commits into
mainfrom
bundle-use-dyn

Conversation

@pietern
Copy link
Copy Markdown
Contributor

@pietern pietern commented Jan 4, 2024

Changes

This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a dyn.Value. This representation is functionally equivalent to Go's any (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false).

Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (config.Root). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using ${var.num_workers} for an integer workers field was impossible until now (though not implemented in this change).

The loader for a dyn.Value includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs.

Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see MarkMutatorEntry and MarkMutatorExit in bundle/config/root.go).

Required changes included in this change:

  • The existing interpolation package is removed in favor of libs/dyn/dynvar.
  • Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators.

To be implemented later:

  • Allow variable references for non-string types.
  • Surface diagnostics about the configuration provided by the user in the validation output.
  • Some mutators use a resource's configuration file path to resolve related relative paths. These depend on bundle/config/paths.Path being set and populated through ConfigureConfigFilePath. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job).

Tests

  • Existing unit tests pass (some have been modified to accommodate)
  • Integration tests pass

This change adds:
* A `config.Walk` function to walk a configuration tree
* A `config.Path` type to represent a value's path inside a tree
* Functions to create a `config.Path` from a string, or convert one to a string
github-merge-queue Bot pushed a commit that referenced this pull request Feb 16, 2024
#1217)

## Changes

When resolving a value returned by the lookup function, the code would
call into `resolveRef` with the key that `resolveKey` was called with.
In doing so, it would cache the _new_ ref under that key.

We fix this by caching ref resolution only at the top level and relying
on lookup caching to avoid duplicate work.

This came up while testing #1098.

## Tests

Unit test.
@pietern pietern enabled auto-merge February 16, 2024 19:35
@pietern pietern added this pull request to the merge queue Feb 16, 2024
Merged via the queue into main with commit 87dd46a Feb 16, 2024
@pietern pietern deleted the bundle-use-dyn branch February 16, 2024 19:49
github-merge-queue Bot pushed a commit that referenced this pull request Feb 16, 2024
## Changes

This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.

The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).

Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.

## Tests

* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case checks that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
andrewnester added a commit that referenced this pull request Feb 20, 2024
CLI:
 * Add support for UC Volumes to the `databricks fs` commands ([#1209](#1209)).

Bundles:
 * Use dynamic configuration model in bundles ([#1098](#1098)).
 * Allow use of variables references in primitive non-string fields ([#1219](#1219)).
 * Add an experimental default-sql template ([#1051](#1051)).
 * Add an experimental dbt-sql template ([#1059](#1059)).

Internal:
 * Add fork-user to winget release workflow ([#1214](#1214)).
 * Use `any` as type for data sources and resources in `tf/schema` ([#1216](#1216)).
 * Avoid infinite recursion when normalizing a recursive type ([#1213](#1213)).
 * Fix issue where interpolating a new ref would rewrite unrelated fields ([#1217](#1217)).
 * Use `dyn.Value` as input to generating Terraform JSON ([#1218](#1218)).

API Changes:
 * Changed `databricks lakehouse-monitors update` command with new required argument order.
 * Added `databricks online-tables` command group.

OpenAPI commit cdd76a98a4fca7008572b3a94427566dd286c63b (2024-02-19)
Dependency updates:
 * Bump Terraform provider to v1.36.2 ([#1215](#1215)).
 * Bump github.com/databricks/databricks-sdk-go from 0.32.0 to 0.33.0 ([#1222](#1222)).
@andrewnester andrewnester mentioned this pull request Feb 20, 2024
github-merge-queue Bot pushed a commit that referenced this pull request Feb 20, 2024
CLI:
* Add support for UC Volumes to the `databricks fs` commands
([#1209](#1209)).

Bundles:
* Use dynamic configuration model in bundles
([#1098](#1098)).
* Allow use of variables references in primitive non-string fields
([#1219](#1219)).
* Add an experimental default-sql template
([#1051](#1051)).
* Add an experimental dbt-sql template
([#1059](#1059)).

Internal:
* Add fork-user to winget release workflow
([#1214](#1214)).
* Use `any` as type for data sources and resources in `tf/schema`
([#1216](#1216)).
* Avoid infinite recursion when normalizing a recursive type
([#1213](#1213)).
* Fix issue where interpolating a new ref would rewrite unrelated fields
([#1217](#1217)).
* Use `dyn.Value` as input to generating Terraform JSON
([#1218](#1218)).

API Changes:
* Changed `databricks lakehouse-monitors update` command with new
required argument order.
 * Added `databricks online-tables` command group.

OpenAPI commit cdd76a98a4fca7008572b3a94427566dd286c63b (2024-02-19)
Dependency updates:
* Bump Terraform provider to v1.36.2
([#1215](#1215)).
* Bump github.com/databricks/databricks-sdk-go from 0.32.0 to 0.33.0
([#1222](#1222)).
denik pushed a commit that referenced this pull request May 20, 2026
## Changes

The approach to do this was:
1. Iterate over all libraries in all job tasks
2. Find references to local libraries
3. Store pointer to `compute.Library` in the matching artifact file to
signal it should be uploaded

This breaks down when introducing #1098 because we can no longer track
unexported state across mutators. The approach in this PR performs the
path matching twice; once in the matching mutator where we check if each
referenced file has an artifacts section, and once during artifact
upload to rewrite the library path from a local file reference to an
absolute Databricks path.

## Tests

Integration tests pass.
denik pushed a commit that referenced this pull request May 20, 2026
## Changes

This reverts commit de5cf73.

The integration test for metadata computation failed. The back and forth
to `dyn.Value` erases unexported fields that the code currently still
depends on. We'll have to retry on top of #1098.
denik pushed a commit that referenced this pull request May 20, 2026
…#1211)

## Changes

This feature supports variable lookups in a `dyn.Value` that are present
in the type but haven't been initialized with a value.

For example: `${bundle.git.origin_url}` is present in the `dyn.Value`
only if it was assigned a value. If it wasn't assigned a value it should
resolve to the empty string. This normalization option, when set,
ensures that all fields that are represented in the specified type are
present in the return value.

This change is in support of #1098.

## Tests

Added unit test.
denik pushed a commit that referenced this pull request May 20, 2026
#1217)

## Changes

When resolving a value returned by the lookup function, the code would
call into `resolveRef` with the key that `resolveKey` was called with.
In doing so, it would cache the _new_ ref under that key.

We fix this by caching ref resolution only at the top level and relying
on lookup caching to avoid duplicate work.

This came up while testing #1098.

## Tests

Unit test.
denik pushed a commit that referenced this pull request May 20, 2026
## Changes

This is a fundamental change to how we load and process bundle
configuration. We now depend on the configuration being represented as a
`dyn.Value`. This representation is functionally equivalent to Go's
`any` (it is variadic) and allows us to capture metadata associated with
a value, such as where it was defined (e.g. file, line, and column). It
also allows us to represent Go's zero values properly (e.g. empty
string, integer equal to 0, or boolean false).

Using this representation allows us to let the configuration model
deviate from the typed structure we have been relying on so far
(`config.Root`). We need to deviate from these types when using
variables for fields that are not a string themselves. For example,
using `${var.num_workers}` for an integer `workers` field was impossible
until now (though not implemented in this change).

The loader for a `dyn.Value` includes functionality to capture any and
all type mismatches between the user-defined configuration and the
expected types. These mismatches can be surfaced as validation errors in
future PRs.

Given that many mutators expect the typed struct to be the source of
truth, this change converts between the dynamic representation and the
typed representation on mutator entry and exit. Existing mutators can
continue to modify the typed representation and these modifications are
reflected in the dynamic representation (see `MarkMutatorEntry` and
`MarkMutatorExit` in `bundle/config/root.go`).

Required changes included in this change:
* The existing interpolation package is removed in favor of
`libs/dyn/dynvar`.
* Functionality to merge job clusters, job tasks, and pipeline clusters
are now all broken out into their own mutators.

To be implemented later:
* Allow variable references for non-string types.
* Surface diagnostics about the configuration provided by the user in
the validation output.
* Some mutators use a resource's configuration file path to resolve
related relative paths. These depend on `bundle/config/paths.Path` being
set and populated through `ConfigureConfigFilePath`. Instead, they
should interact with the dynamically typed configuration directly. Doing
this also unlocks being able to differentiate different base paths used
within a job (e.g. a task override with a relative path defined in a
directory other than the base job).

## Tests

* Existing unit tests pass (some have been modified to accommodate)
* Integration tests pass
denik pushed a commit that referenced this pull request May 20, 2026
## Changes

This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.

The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).

Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.

## Tests

* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case checks that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
denik pushed a commit that referenced this pull request May 20, 2026
CLI:
* Add support for UC Volumes to the `databricks fs` commands
([#1209](#1209)).

Bundles:
* Use dynamic configuration model in bundles
([#1098](#1098)).
* Allow use of variables references in primitive non-string fields
([#1219](#1219)).
* Add an experimental default-sql template
([#1051](#1051)).
* Add an experimental dbt-sql template
([#1059](#1059)).

Internal:
* Add fork-user to winget release workflow
([#1214](#1214)).
* Use `any` as type for data sources and resources in `tf/schema`
([#1216](#1216)).
* Avoid infinite recursion when normalizing a recursive type
([#1213](#1213)).
* Fix issue where interpolating a new ref would rewrite unrelated fields
([#1217](#1217)).
* Use `dyn.Value` as input to generating Terraform JSON
([#1218](#1218)).

API Changes:
* Changed `databricks lakehouse-monitors update` command with new
required argument order.
 * Added `databricks online-tables` command group.

OpenAPI commit cdd76a98a4fca7008572b3a94427566dd286c63b (2024-02-19)
Dependency updates:
* Bump Terraform provider to v1.36.2
([#1215](#1215)).
* Bump github.com/databricks/databricks-sdk-go from 0.32.0 to 0.33.0
([#1222](#1222)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants